Value Iteration is Optic Composition
Abstract
Dynamic programming is a class of algorithms used to compute optimal control policies for Markov decision processes. Dynamic programming is ubiquitous in control theory, and is also the foundation of reinforcement learning. In this paper, we show that value improvement, one of the main steps of dynamic programming, can be naturally seen as composition in a category of optics, and intuitively, the optimal value function is the limit of a chain of optic compositions. We illustrate this with three classic examples: the gridworld, the inverted pendulum and the savings problem. This is a first step towards a complete account of reinforcement learning in terms of parametrised optics.
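The idea that value improvement is optic composition can be illustrated with a minimal sketch, under heavy simplifying assumptions: transitions are deterministic, and optics are reduced to plain monomorphic lenses (a forward `get` and a backward `put`) rather than the general optics used in the paper. The names `Lens`, `compose_costate`, `decision_lens`, the five-state gridworld, its rewards and the discount `GAMMA = 0.9` are all hypothetical illustrations. A value function `State -> float` plays the role of a costate; precomposing it with the one-step lens of the greedy policy performs one Bellman backup, and iterating that composition approaches the optimal value function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = int
Action = int


@dataclass(frozen=True)
class Lens:
    """Simplified lens (State, float) -> (State, float).

    get: forward pass, current state -> next state
    put: backward pass, (current state, value of next state) -> value of current state
    """
    get: Callable[[State], State]
    put: Callable[[State, float], float]


def compose_costate(lens: Lens, value: Callable[[State], float]) -> Callable[[State], float]:
    """Precompose a value function (a costate on the lens's codomain) with the lens."""
    return lambda s: lens.put(s, value(lens.get(s)))


# Hypothetical 1-D gridworld: states 0..4, state 4 is an absorbing goal.
STATES: List[State] = list(range(5))
ACTIONS: List[Action] = [-1, +1]
GOAL = 4
GAMMA = 0.9


def transition(s: State, a: Action) -> State:
    if s == GOAL:
        return s  # the goal is absorbing
    return min(max(s + a, 0), GOAL)  # walls clamp movement at both ends


def reward(s: State, a: Action) -> float:
    return 0.0 if s == GOAL else -1.0  # cost of -1 per step until the goal


def greedy_policy(value: Callable[[State], float]) -> Callable[[State], Action]:
    """Choose the action maximising one-step reward plus discounted value."""
    return lambda s: max(
        ACTIONS, key=lambda a: reward(s, a) + GAMMA * value(transition(s, a))
    )


def decision_lens(policy: Callable[[State], Action]) -> Lens:
    """One step of the decision process under a fixed policy, packaged as a lens."""
    return Lens(
        get=lambda s: transition(s, policy(s)),
        put=lambda s, v_next: reward(s, policy(s)) + GAMMA * v_next,
    )


def value_iteration(tol: float = 1e-12, max_iters: int = 1000) -> Dict[State, float]:
    table = {s: 0.0 for s in STATES}  # initial value function V_0 = 0
    for _ in range(max_iters):
        value = table.get  # view the table as a function State -> float
        # Value improvement: compose the greedy one-step lens with the current value function.
        improved = compose_costate(decision_lens(greedy_policy(value)), value)
        # Tabulate immediately so successive iterates do not nest closures.
        new_table = {s: improved(s) for s in STATES}
        if max(abs(new_table[s] - table[s]) for s in STATES) < tol:
            return new_table
        table = new_table
    return table


if __name__ == "__main__":
    print(value_iteration())
```

In this toy example the iterates reach the fixed point after a few compositions, with values approximately -(1 - 0.9^d) / (1 - 0.9), where d is the distance to the goal (about -3.439 at state 0). Tabulating each iterate is a design choice for efficiency; composing the closures directly would express the same chain of compositions but re-evaluate earlier iterates exponentially often.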
Similar resources
Factored Value Iteration Converges
In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one, the least-squares projection operator is modified so that it does not increase max-norm, and thus preserves convergence. The other modification is that we un...
Value Pursuit Iteration
Value Pursuit Iteration (VPI) is an approximate value iteration algorithm that finds a close to optimal policy for reinforcement learning problems with large state spaces. VPI has two main features: First, it is a nonparametric algorithm that finds a good sparse approximation of the optimal value function given a dictionary of features. The algorithm is almost insensitive to the number of irrel...
External Memory Value Iteration
We propose a unified approach to disk-based search for deterministic, non-deterministic, and probabilistic (MDP) settings. We provide the design of an external Value Iteration algorithm that performs at most O(l_G · scan(|E|) + t_max · sort(|E|)) I/Os, where l_G is the length of the largest back-edge in the breadth-first search graph G having |E| edges, t_max is the maximum number of iterations, an...
Value Iteration Networks
We introduce the value iteration network (VIN): a fully differentiable neural network with a ‘planning module’ embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented a...
Focused Topological Value Iteration
Topological value iteration (TVI) is an effective algorithm for solving Markov decision processes (MDPs) optimally, which 1) divides an MDP into strongly-connected components, and 2) solves these components sequentially. Yet, TVI’s usefulness tends to degrade if an MDP has large components, because the cost of the division process isn’t offset by gains during solution. This paper presents a new...
Journal
Journal title: Electronic Proceedings in Theoretical Computer Science
Year: 2023
ISSN: 2075-2180
DOI: https://doi.org/10.4204/eptcs.380.24